HealthData@EU Pilot - Sciensano Use Case: Population uptake metrics: COVID-19 test positivity, vaccination and hospitalization

Quality Analysis

Report version: v1.1

Overview

This section provides an overview of the imported dataset. Dataset statistics, variable types, a missing data profile and potential alerts are shown below.

Discrete variables: 23
Continuous variables: 4
All-missing variables: 0


exitus_dt: 90657 (90.7%) missing values
dose_3_brand_cd: 90194 (90.2%) missing values
dose_3_dt: 90230 (90.2%) missing values
fully_vaccinated_dt: 91780 (91.8%) missing values
person_id: values are not all unique (number of duplicate values: 4999)
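
The alerts above can be reproduced in spirit with a few lines of code. The following is a minimal Python sketch, not the report's own implementation: the helper names `missing_profile` and `duplicate_count` and the toy records are hypothetical, and it assumes that "duplicate values" counts every occurrence of a value beyond its first, which the report does not state explicitly.

```python
from collections import Counter


def missing_profile(rows, column):
    """Return (n_missing, share_missing) for one column; None marks a missing value."""
    n_missing = sum(1 for row in rows if row.get(column) is None)
    return n_missing, n_missing / len(rows)


def duplicate_count(rows, column):
    """Count occurrences beyond the first of each non-missing value (assumed definition)."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sum(n - 1 for n in counts.values())


# Hypothetical toy records, not taken from the imported dataset.
rows = [
    {"person_id": "A", "exitus_dt": None},
    {"person_id": "B", "exitus_dt": "2021-03-01"},
    {"person_id": "A", "exitus_dt": None},
    {"person_id": "C", "exitus_dt": None},
]

n_missing, share = missing_profile(rows, "exitus_dt")
print(f"exitus_dt: {n_missing} ({share:.1%}) missing values")
print("person_id duplicate values:", duplicate_count(rows, "person_id"))
```

On the toy records, three of the four `exitus_dt` values are missing and `person_id` "A" appears twice, so one duplicate is counted.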

Variables

This section provides more detailed information per variable in the imported dataset.

Class of the variable: character

More than 100 distinct values

More than 100 distinct values

Class of the variable: character
Class of the variable: integer

More than 100 distinct values

[Histogram: 5000 rows with non-finite values excluded]
Class of the variable: character
Class of the variable: Date

More than 100 distinct values

[Histogram: 90657 rows with non-finite values excluded]
Class of the variable: logical
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character

More than 100 distinct values

More than 100 distinct values

Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: logical
Class of the variable: integer
[Histogram: 5000 rows with non-finite values excluded]
Class of the variable: integer
[Histogram: 5000 rows with non-finite values excluded]
Class of the variable: character
Class of the variable: Date

More than 100 distinct values

[Histogram: 5000 rows with non-finite values excluded]
Class of the variable: character
Class of the variable: Date

More than 100 distinct values

[Histogram: 14312 rows with non-finite values excluded]
Class of the variable: character
Class of the variable: Date

More than 100 distinct values

[Histogram: 90230 rows with non-finite values excluded]
Class of the variable: integer
[Histogram: 5000 rows with non-finite values excluded]
Class of the variable: Date

More than 100 distinct values

[Histogram: 91780 rows with non-finite values excluded]
Class of the variable: logical

Compliance with the Common Data Model specification

We check whether the imported dataset complies with the data model specification (https://docs.google.com/spreadsheets/d/1Eva2ucg_M0WaDkCaF7qfBxk2DwTlUac9gKuP3xck4rw/edit#gid=0).

To comply with the data model, the dataset must pass a set of validation rules. The data are tested against these rules, and the results of the validation are summarized below.

Rule  Items   Passes  Fails  % fails  NAs   % NAs  Error  Warning  Validation rule
V01   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(sex_cd) | sex_cd %vin% c("0", "1", "2", "9")
V02   100000  85378   14622  14.62%   0     0%     FALSE  FALSE    is.na(age_nm) | age_nm - 18 >= -1e-08 & age_nm - 115 <= 1e-08
V03   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(age_cd) | age_cd %vin% c("0-18", "18-25", "25-35", "35-45", "45-55", "55-65", "65-75", "75-85", "85-95", "95-105", "105-115")
V04   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(exitus_bl) | exitus_bl %vin% c(TRUE, FALSE)
V05   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(education_level_cd) | education_level_cd %vin% c("Low", "Middle", "High")
V06   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(income_category_cd) | income_category_cd %vin% c("Low", "Middle", "High")
V07   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(migration_background_cd) | migration_background_cd %vin% c("NATIVE", "EU", "NON-EU", "PAR")
V08   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(household_type_cd) | household_type_cd %vin% c("ALONE", "COUPLE", "COUPLE_CHILD", "LONE", "EXTENDED", "OTHER")
V09   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(hospi_due_to_covid_bl) | hospi_due_to_covid_bl %vin% c(TRUE, FALSE)
V10   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(test_positive_to_covid_nm) | test_positive_to_covid_nm - 0 >= -1e-08 & test_positive_to_covid_nm - 50 <= 1e-08
V11   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(test_nm) | test_nm - 0 >= -1e-08 & test_nm - 50 <= 1e-08
V12   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(dose_1_brand_cd) | dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V13   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(dose_2_brand_cd) | dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V14   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(dose_3_brand_cd) | dose_3_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V15   100000  100000  0      0%       0     0%     FALSE  FALSE    is.na(doses_nm) | doses_nm - 0 >= -1e-08 & doses_nm - 10 <= 1e-08
V16   100000  55008   44992  44.99%   0     0%     FALSE  FALSE    (is.na(dose_1_dt) & is.na(dose_2_dt)) | is.na(dose_2_dt) | !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)
V17   100000  94807   5193   5.19%    0     0%     FALSE  FALSE    (is.na(dose_2_dt) & is.na(dose_3_dt)) | is.na(dose_3_dt) | !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)
V18   100000  99622   378    0.38%    0     0%     FALSE  FALSE    is.na(fully_vaccinated_dt) | is.na(exitus_dt) | !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt
V19   100000  81383   13863  13.86%   4754  4.75%  FALSE  FALSE    (!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & doses_nm - 3 >= -1e-08) | (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 2) <= 1e-08) | (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 1) <= 1e-08) | (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 0) <= 1e-08)
V20   100000  95242   4758   4.76%    0     0%     FALSE  FALSE    is.na(dose_1_dt) | (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V21   100000  87752   12248  12.25%   0     0%     FALSE  FALSE    is.na(dose_2_dt) | (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V22   100000  97824   2176   2.18%    0     0%     FALSE  FALSE    is.na(dose_3_dt) | (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))

The vertical bars in the validation plot indicate the percentage of records ‘Passing’, ‘Failing’ and ‘Missing’.
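
Each rule yields one of three outcomes per record: pass, fail, or NA. The rule expressions appear to follow R semantics, where a missing operand propagates through `&` and `|` unless the other operand already decides the result. This is one plausible reading of why a rule of the form `is.na(x) | ...` never returns NA, while V19, which compares `doses_nm` without an `is.na()` guard, does. The following Python sketch illustrates that reading under stated assumptions: `None` stands in for NA, all function names and the toy records are hypothetical, and `rule_unvaccinated_clause` covers only the last clause of V19 rather than the full rule.

```python
def and3(a, b):
    """Three-valued AND: False decides the result, otherwise None (NA) propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: True decides the result, otherwise None (NA) propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def in_range(x, low, high, eps=1e-08):
    """x - low >= -eps & x - high <= eps, returning None when x is missing."""
    if x is None:
        return None
    return and3(x - low >= -eps, x - high <= eps)


def rule_v02(rec):
    """is.na(age_nm) | age_nm - 18 >= -1e-08 & age_nm - 115 <= 1e-08"""
    age = rec["age_nm"]
    return or3(age is None, in_range(age, 18, 115))


def rule_unvaccinated_clause(rec):
    """Last clause of V19 only: all dose dates missing and doses_nm equal to 0."""
    no_dates = all(rec[k] is None for k in ("dose_1_dt", "dose_2_dt", "dose_3_dt"))
    doses = rec["doses_nm"]
    is_zero = None if doses is None else abs(doses - 0) <= 1e-08
    return and3(no_dates, is_zero)


def tally(rule, records):
    """Summarize rule outcomes in the same columns as the table above."""
    results = [rule(rec) for rec in records]
    fails = sum(r is False for r in results)
    return {
        "items": len(results),
        "passes": sum(r is True for r in results),
        "fails": fails,
        "nas": sum(r is None for r in results),
        "pct_fails": 100 * fails / len(results),
    }


# Hypothetical toy records, not taken from the imported dataset.
no_dose = {"dose_1_dt": None, "dose_2_dt": None, "dose_3_dt": None}
records = [
    {"age_nm": 40, "doses_nm": 0, **no_dose},
    {"age_nm": 12, "doses_nm": None, **no_dose},
    {"age_nm": None, "doses_nm": 1, **no_dose, "dose_1_dt": "2021-02-01"},
    {"age_nm": 18, "doses_nm": 2, **no_dose},
]

print(tally(rule_v02, records))
print(tally(rule_unvaccinated_clause, records))
```

In this sketch the record with a missing `age_nm` passes the V02-style rule because the `is.na()` guard evaluates to true, whereas the record with a missing `doses_nm` yields NA for the unguarded clause.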